12-SD Samantaray
Authors
Abstract
SUMMARY Consider a regression problem in which there are many more explanatory variables than data points, i.e., p >> n. Without reducing the number of variables, inference is essentially impossible. We therefore group the p explanatory variables into blocks by clustering, evaluate statistics on the blocks, and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of n, p, classes of statistics, clustering algorithms, penalty terms, and data types. When n is not large, the discrimination over the number of statistics is weak, but computations suggest regressing on approximately [n/K] statistics, where K is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an L_q norm with high enough q.

[...] as the least squares estimator of β, provided the inverse exists. If |X'X| is small, the inverse is large in the sense that some of its eigenvalues must be large. When p > n, X is n × p, i.e., short and fat. For Short Fat Data (SFD), |X'X| = 0, so its inverse fails to exist. The central issue here is that the mean function for Y, EY, lies in a space of dimension p while only n < p data points are available. That is, the SFD, or large p, small n, problem would disappear if we had more data. However, even though one can imagine arbitrarily large values of n, in practice they do not exist. Alternatively, we can try to do effective dimension reduction by regressing Y on functions of the X_i. The idea is that if we evaluate a comparatively small number of suitably chosen functions, i.e., features, on each X_i and then do penalized regression on those features, we will have retained all the information in the data about the response Y. The question is what kind of statistics
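The pipeline outlined in the summary (cluster the variables into blocks, compute a statistic per block, then run a penalized regression on those statistics) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy data, the correlation-based "leader" clustering used as a stand-in for the clustering algorithm, the block mean as the statistic, the L2 (ridge) penalty, and all function names.

```python
import random

def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return suv / (su * sv)

def leader_blocks(X, threshold):
    """Stand-in clustering: a variable joins the first block whose leader
    it correlates with (in absolute value) at or above `threshold`;
    otherwise it starts a new block."""
    cols = list(zip(*X))
    leaders, blocks = [], []
    for j, col in enumerate(cols):
        for k, lead in enumerate(leaders):
            if abs(corr(col, cols[lead])) >= threshold:
                blocks[k].append(j)
                break
        else:
            leaders.append(j)
            blocks.append([j])
    return blocks

def block_means(X, blocks):
    """One statistic per block: the mean of its variables, per data point."""
    return [[sum(row[j] for j in b) / len(b) for b in blocks] for row in X]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def ridge(Z, y, lam):
    """Minimise ||y - Z b||^2 + lam * ||b||^2 (no intercept term)."""
    m = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(m)]
    return solve(A, rhs)

# Toy short-fat data (assumed): n = 8 points, p = 30 variables, each variable
# a noisy copy of one of three orthogonal, mean-zero latent patterns.
random.seed(0)
n, p = 8, 30
factors = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
X = [[factors[j % 3][i] + random.gauss(0, 0.05) for j in range(p)]
     for i in range(n)]
y = [2 * factors[0][i] - factors[1][i] + random.gauss(0, 0.1)
     for i in range(n)]

blocks = leader_blocks(X, threshold=0.8)   # group the p variables
Z = block_means(X, blocks)                 # n rows, one column per block
beta = ridge(Z, y, lam=0.1)                # penalized fit on block statistics

print("block sizes:", [len(b) for b in blocks])
print("coefficients on block means:", beta)
```

With p = 30 and n = 8, X'X is singular and ordinary least squares on the raw variables is unavailable, whereas the regression on the block statistics involves far fewer coefficients than data points. Other statistics, clustering rules, or penalties could be swapped in by replacing `block_means`, `leader_blocks`, or `ridge`.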